# Knowledge Distillation BERT
## Prunedbert L12 H256 A4 Finetuned

A lightweight model based on the BERT architecture, pre-trained with knowledge distillation, with a hidden dimension of 256 and 4 attention heads.

Tags: Large Language Model, Transformers
Author: eli4s
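
As a rough illustration of the architecture named above, the sketch below builds an untrained BERT encoder with 12 layers (L12), hidden size 256 (H256), and 4 attention heads (A4) using the Transformers library. The intermediate size of 1024 is an assumption following the usual 4x-hidden BERT convention, not something stated in the listing.

```python
# Sketch only: an untrained BERT encoder matching the L12 / H256 / A4 shape
# described above. The intermediate_size of 1024 (4 * hidden_size) is an
# assumption based on the standard BERT convention.
from transformers import BertConfig, BertModel

config = BertConfig(
    num_hidden_layers=12,    # L12
    hidden_size=256,         # H256
    num_attention_heads=4,   # A4
    intermediate_size=1024,  # assumed: 4 * hidden_size
)
model = BertModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```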

## Bert L12 H384 A6

A lightweight BERT model pre-trained on the BookCorpus dataset via knowledge distillation, with the hidden dimension reduced to 384 and 6 attention heads.

Tags: Large Language Model, Transformers
Author: eli4s
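
If the checkpoint is published on the Hugging Face Hub, it can be loaded for feature extraction with Transformers as sketched below; the repo id `eli4s/Bert-L12-h384-A6` is inferred from the author and model name listed here and should be treated as an assumption.

```python
# Sketch only: load the distilled encoder and extract hidden states.
# The repo id is an assumption inferred from the listing above.
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "eli4s/Bert-L12-h384-A6"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation keeps BERT small.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # expected shape: (1, seq_len, 384)
print(hidden.shape)
```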